Benchmarking unsupervised near-duplicate image detection
Unsupervised near-duplicate detection has many practical applications, ranging from social media analysis and web-scale retrieval to digital image forensics. It entails running a threshold-limited query on a set of descriptors extracted from the images, with the goal of identifying all possible near-duplicates while limiting the false positives due to visually similar images. Since the rate of false alarms grows with the dataset size, a very high specificity is required, up to 1-10^-9 (i.e., a false positive rate of 10^-9) for realistic use cases; this important requirement, however, is often overlooked in the literature. In recent years, descriptors based on deep convolutional neural networks have matched or surpassed traditional feature extraction methods in content-based image retrieval tasks. To the best of our knowledge, ours is the first attempt to establish the performance range of deep learning-based descriptors for unsupervised near-duplicate detection on a range of datasets, encompassing a broad spectrum of near-duplicate definitions. We leverage both established and new benchmarks, such as the Mir-Flickr Near-Duplicate (MFND) dataset, in which a known ground truth is provided for all possible pairs over a general, large-scale image collection. To compare the specificity of different descriptors, we reduce the problem of unsupervised detection to that of binary classification of near-duplicate vs. not-near-duplicate images. The latter can be conveniently characterized using Receiver Operating Characteristic (ROC) curves. Our findings in general favor fine-tuning deep convolutional networks over using off-the-shelf features, but differences at high-specificity settings depend on the dataset and are often small. The best performance was observed on the MFND benchmark, achieving 96% sensitivity at a false positive rate of 1.43x10^-6.
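The reduction described in the abstract, from unsupervised detection to binary classification of image pairs, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the descriptors, pair sampling, and distance metric (plain L2) are assumptions made here for clarity.

```python
import numpy as np

def pair_distances(descriptors, pairs):
    """L2 distance between the descriptors of each candidate image pair."""
    return np.linalg.norm(descriptors[pairs[:, 0]] - descriptors[pairs[:, 1]], axis=1)

def roc_points(distances, is_near_duplicate, thresholds):
    """Sensitivity (TPR) and false positive rate at each distance threshold.

    A pair is declared near-duplicate when its distance falls below the
    threshold, mirroring the threshold-limited query described above."""
    pos = np.asarray(is_near_duplicate, dtype=bool)
    tpr = np.array([(distances[pos] <= t).mean() for t in thresholds])
    fpr = np.array([(distances[~pos] <= t).mean() for t in thresholds])
    return tpr, fpr
```

Sweeping the threshold over the observed distances traces the full ROC curve; at the very high specificities the abstract targets, the interesting operating points are those with the lowest false positive rates.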
On the Use of Causal Models to Build Better Datasets
In recent years, the Machine Learning and Deep Learning communities have devoted considerable effort to studying ever better models and more efficient training strategies. Nonetheless, the fundamental role played by dataset bias in the final behaviour of the trained models calls for strong and principled methods to collect, structure and curate datasets prior to training. In this paper, we provide an overview of the use of causal models to achieve a deeper understanding of the underlying structure of datasets and to mitigate biases, supported by several real-life use cases from the medical and industrial domains.
Classification of tagged material in a set of tomographic images of colorectal region
A method of classification of image portions corresponding to faecal residues from a tomographic image of a colorectal region, which comprises a plurality of voxels (2) each having a predetermined intensity value and which shows at least one portion of colon (6a, 6b, 6c, 6d) comprising at least one area of tagged material (10). The area of tagged material (10) comprises at least one area of faecal residue (10a) and at least one area of tissue affected by tagging (10b). The image further comprises at least one area of air (8) which comprises an area of pure air (8a) not influenced by the faecal residues. The method comprises the operations of identifying (100), on the basis of a predetermined identification criterion based on the intensity values, above-threshold connected regions comprising connected voxels (2) and identifying, within the above-threshold connected regions, a plurality of connected regions of tagged material comprising voxels (2) representing the area of tagged material (10). The method further comprises the operation of classifying (104) each plurality of connected regions of tagged material on the basis of specific classification comparison criteria for each connected region, in such a way as to identify voxels (20) corresponding to the area of faecal residue (10a) and voxels (2) corresponding to the area of tissue affected by tagging (10b).
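The identification step (100), finding above-threshold connected regions of voxels, can be sketched with a simple breadth-first labelling. This is an illustrative 6-connected flood fill over a NumPy volume, not the patent's actual procedure; the threshold value and connectivity are assumptions.

```python
import numpy as np
from collections import deque

def above_threshold_regions(volume, thr):
    """Label face-connected regions of voxels whose intensity exceeds `thr`.

    Returns an integer label array (0 = below threshold) and the number of
    regions found. A sketch of the identification step described above."""
    mask = volume > thr
    labels = np.zeros(volume.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # voxel already assigned to a region
        count += 1
        labels[start] = count
        q = deque([start])
        while q:
            p = q.popleft()
            # visit the face-adjacent neighbours along each axis
            for axis in range(volume.ndim):
                for step in (-1, 1):
                    n = list(p)
                    n[axis] += step
                    n = tuple(n)
                    if all(0 <= n[i] < volume.shape[i] for i in range(volume.ndim)) \
                            and mask[n] and not labels[n]:
                        labels[n] = count
                        q.append(n)
    return labels, count
```

Each labelled region would then be passed to the classification step (104), which applies per-region comparison criteria to separate faecal residue from tagging-affected tissue.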
SoccER: Computer graphics meets sports analytics for soccer event recognition
Automatic event detection from images or wearable sensors is a fundamental step towards the development of advanced sport analytics and broadcasting software. However, the collection and annotation of large-scale sport datasets is hindered by technical obstacles, the cost of data acquisition and annotation, and commercial interests. In this paper, we present the Soccer Event Recognition (SoccER) data generator, which builds upon an existing, high-quality open source game engine to enable synthetic data generation. The software generates detailed spatio-temporal data from simulated soccer games, along with fine-grained, automatically generated event ground truth. The SoccER software suite also includes a complete event detection system, entirely developed and tested on a synthetic dataset comprising 500 minutes of gameplay and more than 1 million events. We close the paper by discussing avenues for future research in sports event recognition enabled by the use of synthetic data.
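Event detection over spatio-temporal game data often starts from simple possession-based rules. The sketch below is a toy illustration, not the SoccER detector: the per-frame possession representation (player id and team per frame) is a hypothetical format assumed here.

```python
def detect_passes(ball_owner_ids, owner_teams):
    """Toy rule-based pass detector over per-frame possession labels.

    ball_owner_ids: per-frame id of the player possessing the ball (None if
    the ball is loose). owner_teams: mapping from player id to team label.
    A pass is flagged when possession moves between two different players
    of the same team; a change across teams would be an interception."""
    events = []
    for t in range(1, len(ball_owner_ids)):
        prev, cur = ball_owner_ids[t - 1], ball_owner_ids[t]
        if prev is not None and cur is not None and prev != cur \
                and owner_teams[prev] == owner_teams[cur]:
            events.append(("pass", t, prev, cur))
    return events
```

A synthetic generator makes such rules easy to validate, since the engine itself emits the ground-truth event stream the detector's output can be compared against.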
Method of classification of tagged material in a set of tomographic images of colorectal region
A method of classification of image portions corresponding to fecal residues from a tomographic image of a colorectal region, which comprises a plurality of voxels (2) each having a predetermined intensity value and which shows at least one portion of colon (6a, 6b, 6c, 6d) comprising at least one area of tagged material (10). The area of tagged material (10) comprises at least one area of fecal residue (10a) and at least one area of tissue affected by tagging (10b). The image further comprises at least one area of air (8) which comprises an area of pure air (8a) not influenced by the fecal residues. The method comprises the operations of identifying (100), on the basis of a predetermined identification criterion based on the intensity values, above-threshold connected regions comprising connected voxels (2) and identifying, within the above-threshold connected regions, a plurality of connected regions of tagged material comprising voxels (2) representing the area of tagged material (10). The method further comprises the operation of classifying (104) each plurality of connected regions of tagged material on the basis of specific classification comparison criteria for each connected region, in such a way as to identify voxels (20) corresponding to the area of fecal residue (10a) and voxels (2) corresponding to the area of tissue affected by tagging (10b).
Breast mass detection with faster R-CNN: On the feasibility of learning from noisy annotations
In this work we study the impact of noise on the training of object detection networks for the medical domain, and how it can be mitigated by improving the training procedure. Annotating large medical datasets for training data-hungry deep learning models is expensive and time-consuming. Leveraging information that is already collected in clinical practice, in the form of text reports, bookmarks or lesion measurements, would substantially reduce this cost. Obtaining precise lesion bounding boxes through automatic mining procedures, however, is difficult. We provide here a quantitative evaluation of the effect of bounding box coordinate noise on the performance of Faster R-CNN object detection networks for breast mass detection. Varying degrees of noise are simulated by randomly modifying the bounding boxes: in our experiments, bounding boxes could be enlarged up to six times the original size. The noise is injected in the CBIS-DDSM collection, a well-curated public mammography dataset for which accurate lesion locations are available. We show how, due to an imperfect matching between the ground truth and the network bounding box proposals, the noise is propagated during training and reduces the ability of the network to correctly discriminate lesions from background. When using the standard Intersection over Union criterion, the area under the FROC curve decreases by up to 9%. A novel matching criterion is proposed to improve tolerance to noise.
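The two ingredients of the experiment above, enlarging boxes to simulate annotation noise and matching boxes by Intersection over Union, can be sketched as follows. This is a simplified version: the paper's noise model randomly perturbs boxes, while this sketch grows a box symmetrically around its centre.

```python
import math

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def enlarge_box(box, area_factor):
    """Grow a box's area by `area_factor` (up to 6 in the experiments above)
    around its centre, a simplified stand-in for the random noise model."""
    x1, y1, x2, y2 = box
    s = math.sqrt(area_factor)  # scale each side so the area grows by area_factor
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = s * (x2 - x1) / 2.0, s * (y2 - y1) / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)
```

Note how quickly IoU with the true lesion drops as the annotation grows: a 4x area enlargement already brings the IoU down to 0.25, below the usual 0.5 matching threshold, which is how the noise corrupts the proposal labels during training.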
Immersive Virtual Reality-Based Interfaces for Character Animation
Virtual Reality (VR) has increasingly attracted the attention of the computer animation community in search of more intuitive and effective alternatives to the current sophisticated user interfaces. Previous works in the literature already demonstrated the higher affordances offered by VR interaction, as well as the enhanced spatial understanding that arises thanks to the strong sense of immersion guaranteed by virtual environments. These factors have the potential to improve the animators' job, which is tremendously skill-intensive and time-consuming. The present paper explores the opportunities provided by VR-based interfaces for the generation of 3D animations via armature deformation. To the best of the authors' knowledge, this is the first tool that allows users to manage a complete pipeline supporting the above animation method, by letting them execute key tasks such as rigging, skinning and posing within a well-known animation suite using a customizable interface. Moreover, it is the first work to validate, in both objective and subjective terms, character animation performance in the above tasks and under realistic work conditions involving different user categories. In our experiments, task completion time was reduced by 26%, on average, while maintaining almost the same levels of accuracy and precision for both novice and experienced users.
PROTOtypical Logic Tensor Networks (PROTO-LTN) for Zero Shot Learning
Semantic image interpretation can vastly benefit from approaches that combine sub-symbolic distributed representation learning with the capability to reason at a higher level of abstraction. Logic Tensor Networks (LTNs) are a class of neuro-symbolic systems based on a differentiable, first-order logic grounded into a deep neural network. LTNs replace the classical concept of training set with a knowledge base of fuzzy logical axioms. By defining a set of differentiable operators to approximate the role of connectives, predicates, functions and quantifiers, a loss function is automatically specified so that LTNs can learn to satisfy the knowledge base. We focus here on the subsumption or isOfClass predicate, which is fundamental to encode most semantic image interpretation tasks. Unlike conventional LTNs, which rely on a separate predicate for each class (e.g., dog, cat), each with its own set of learnable weights, we propose a common isOfClass predicate, whose level of truth is a function of the distance between an object embedding and the corresponding class prototype. The PROTOtypical Logic Tensor Networks (PROTO-LTN) extend the current formulation by grounding abstract concepts as parametrized class prototypes in a high-dimensional embedding space, while reducing the number of parameters required to ground the knowledge base. We show how this architecture can be effectively trained in the few- and zero-shot learning scenarios. Experiments on Generalized Zero Shot Learning benchmarks validate the proposed implementation as a competitive alternative to traditional embedding-based approaches. The proposed formulation makes it possible to integrate, in zero shot learning settings, background knowledge in the form of logical axioms to compensate for the lack of labelled examples.
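The shared isOfClass predicate described above can be sketched as a truth value that decays with the embedding-to-prototype distance. The exponential squashing and the temperature parameter below are assumptions of this sketch, not necessarily the paper's exact grounding.

```python
import numpy as np

def is_of_class(embedding, prototype, alpha=1.0):
    """Fuzzy truth of isOfClass(x, C) in [0, 1]: a decreasing function of the
    distance between the object embedding and the class prototype. The same
    function serves every class; only the prototype changes, which is what
    keeps the number of grounded parameters small."""
    d = np.linalg.norm(np.asarray(embedding) - np.asarray(prototype))
    return float(np.exp(-alpha * d))
```

An object lying exactly on a class prototype gets truth 1; truth decreases monotonically with distance, so the predicate doubles as a soft nearest-prototype classifier, which is what enables the zero-shot setting when prototypes come from semantic class descriptions.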
Bridging the gap between Natural and Medical Images through Deep Colorization
Deep learning has thrived by training on large-scale datasets. However, in many applications, as for medical image diagnosis, getting massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancies all at once through pretrained model fine-tuning. In this work, we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments showed how our approach is particularly efficient in case of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.
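The idea of a dedicated color-adaptation module feeding a pretrained backbone can be illustrated with a per-pixel linear map. This is a minimal sketch; the actual module's architecture is not described in the abstract, and the 1x1-convolution form below is an assumption made for illustration.

```python
import numpy as np

def color_adaptation(gray, weights, bias):
    """Map a single-channel X-ray to the three channels a natural-image
    backbone expects, via a per-pixel linear map (equivalent to a 1x1
    convolution whose parameters would be learned from scratch).

    gray: (H, W) image; weights, bias: (3,) arrays.
    Returns a (3, H, W) pseudo-color image."""
    return weights[:, None, None] * gray[None, :, :] + bias[:, None, None]
```

Because the module sits in front of the backbone and is differentiable, it can be trained end-to-end together with the fine-tuned classifier, which is the "easy-to-train" property the abstract highlights.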
Faster-LTN: a neuro-symbolic, end-to-end object detection architecture
The detection of semantic relationships between objects represented in an image is one of the fundamental challenges in image interpretation. Neural-symbolic techniques, such as Logic Tensor Networks (LTNs), allow the combination of semantic knowledge representation and reasoning with the ability to efficiently learn from examples that is typical of neural networks. We here propose Faster-LTN, an object detector composed of a convolutional backbone and an LTN. To the best of our knowledge, this is the first attempt to combine both frameworks in an end-to-end training setting. This architecture is trained by optimizing a grounded theory which combines labelled examples with prior knowledge, in the form of logical axioms. Experimental comparisons show competitive performance with respect to the traditional Faster R-CNN architecture.
Comment: accepted for presentation at ICANN 202
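Training by "optimizing a grounded theory" means driving the aggregate fuzzy satisfaction of all axioms, those encoding labelled examples and those encoding prior knowledge alike, towards 1. The sketch below uses a plain mean as the aggregator; LTN implementations typically use generalized-mean aggregators, so this choice is a simplifying assumption.

```python
import numpy as np

def grounded_theory_loss(axiom_truths):
    """Scalar loss from the fuzzy truth values (each in [0, 1]) of the
    grounded theory's axioms: the less satisfied the theory, the larger
    the loss, so gradient descent pushes satisfaction towards 1."""
    truths = np.asarray(axiom_truths, dtype=float)
    return 1.0 - truths.mean()
```

A fully satisfied theory yields zero loss; an axiom violated by the current detector weights raises the loss, so prior knowledge and labelled data contribute to training through the same mechanism.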